Differential Human Learning Optimization Algorithm

Human Learning Optimization (HLO) is an efficient metaheuristic algorithm in which three learning operators, i.e., the random learning operator, the individual learning operator, and the social learning operator, are developed to search for optima by mimicking the learning behaviors of humans. In real life, people learn not only from the global optimum but also from the best solutions of other individuals, and the operators of Differential Evolution are updated based on the optima of other individuals. Inspired by these facts, this paper proposes two novel differential human learning optimization algorithms (DEHLOs), into which the Differential Evolution strategy is introduced to enhance the optimization ability of the algorithm. The two algorithms, which improve HLO at the individual level and at the population level, are named DEHLO1 and DEHLO2, respectively. Multidimensional knapsack problems are adopted as benchmark problems to validate the performance of DEHLOs, and the results are compared with the standard HLO and Modified Binary Differential Evolution (MBDE) as well as other state-of-the-art metaheuristics. The experimental results demonstrate that the developed DEHLOs significantly outperform the other algorithms and that DEHLO2 achieves the best overall performance on various problems.

Human beings are the smartest creatures in the world because of their strong learning ability; they are smarter than other living beings, such as birds, ants, and fish. To solve complex problems effectively, humans learn repeatedly to improve their skills and adapt better to the external environment. Many human learning activities are similar to the search process of metaheuristics. For example, when a person learns something new, he or she repeatedly practices to improve new skills and evaluates his or her performance to guide the following study. The process of human learning, just like a metaheuristic algorithm, iteratively generates a new solution and calculates the corresponding fitness to adjust the following search. Therefore, it is reasonable to consider that a metaheuristic algorithm based on human learning mechanisms may have advantages over algorithms based on other biological systems on complicated problems. Inspired by this thought, Wang et al. [34] proposed the Human Learning Optimization Algorithm (HLO) based on a simplified human learning model, in which three learning operators, i.e., the random learning operator (RLO), the individual learning operator (ILO), and the social learning operator (SLO), are developed to search for the optimal solution. These operators represent that a person may learn randomly due to the lack of prior knowledge or when exploring new strategies, learn from his or her previous experience, and learn from his or her friends and books, respectively.
To strengthen the search efficiency of HLO, a few enhanced variants have been subsequently developed. An adaptive simplified human learning optimization algorithm (ASHLO) [35] was proposed in which pr and pi, two control parameters determining the rates of performing RLO, ILO, and SLO, are linearly adjusted to achieve a balance between global search and local search. Encouraged by the success of ASHLO, a sine-cosine adaptive human learning optimization algorithm (SCHLO) [36] was proposed in which pr and pi are dynamically tuned in a reasonable range by the sine and cosine functions so that SCHLO can efficiently escape from local optima. Later, an improved adaptive human learning optimization algorithm (IAHLO) [37] was presented to accurately tune the control parameter pr so that IAHLO can better keep diversity at the early stage and perform the local search more efficiently at the later stages of iterations. Besides, inspired by the intelligence quotient (IQ) of humans, a diverse human learning optimization algorithm (DHLO) [38] was presented in which the control parameter pi is initialized by a Gaussian distribution and dynamically adjusted according to the pi value of the best individual. To further extend HLO, a novel hybrid-coded HLO (HcHLO) [39] was proposed to tackle mix-coded problems, in which real-coded parameters are optimized by a new continuous HLO (CHLO) [39] and the binary and discrete variables are handled by the binary learning operators of HLO. Until now, HLO has been successfully applied to engineering design problems [37], knapsack problems [40], optimal power flow calculation [41], extractive text summarization [42], financial markets forecasting [43], furnace flame recognition [44], scheduling problems [45], and intelligent control [46]. In particular, HLO obtained the best-so-far results on two well-studied sets of multidimensional knapsack problems, i.e., 5.100 and 10.100 [40], as well as on the set of mixed-variable optimization problems [39], which implies the promising advantages of HLO.
In HLO, social learning adopts a greedy strategy to generate a new candidate, i.e., simply yet efficiently copying the bit value from the social knowledge database (SKD), which makes the algorithm prone to falling into local optima. Therefore, the relearning operator was introduced into HLO [40] to help the algorithm escape from local optima. However, the relearning operator may destroy the existing optimal information, which further reduces the performance of the algorithm. On the other hand, the social learning of HLO learns only from the global best solution, which is inconsistent with actual society. In real life, people can learn from the best solutions of other individuals in the population. The Modified Binary Differential Evolution (MBDE), our previous work [47], reserves the updating strategy of the standard Differential Evolution (DE) [7] so that it keeps the robustness of parameter settings and the diversity of the population and searches for optimal bit information effectively. Therefore, this paper proposes two novel differential human learning optimization algorithms (DEHLOs), in which the strategy of MBDE is introduced into HLO to further improve the performance of the algorithm by using the optimal information of other individuals. This paper is organized as follows. Section 2 gives a brief review of HLO and MBDE. Section 3 presents the concepts, operators, and implementation of the proposed DEHLO1 and DEHLO2 in detail. Section 4 verifies that the proposed DEHLOs have significant advantages over the compared algorithms on the multidimensional knapsack problems. Finally, conclusions are drawn in Section 5.

Human Learning Optimization.
The HLO adopts the binary-coding framework, and consequently an individual in HLO is represented by a binary string as equation (1), where $x_i$ denotes the $i$-th individual, $N$ is the size of the population, and $M$ is the dimension of solutions. Each bit of the binary string is initialized as "0" or "1" randomly.

Random learning operator: At the beginning of the learning process, people always keep exploring new strategies to solve problems because there is no prior knowledge [48]. Besides, an individual cannot fully replicate previous experience and social knowledge because of external disturbance and forgetting. To emulate these phenomena of human random learning, HLO executes the random learning operator (RLO) with a certain probability as equation (2), where $r_1$ is a stochastic number between 0 and 1.

Individual learning operator: Individual learning is defined as the ability to build knowledge through individual reflection about external stimuli and sources [49], which can be regarded as individual behavior in the trial-and-error process of continuous improvement. To mimic human individual learning, the best individual solutions are reserved in the individual knowledge database (IKD) as equation (3), where $IKD_i$ denotes the individual knowledge database of person $i$, $K$ is the predefined number of solutions saved in the IKD, and $ikd_{ip}$ represents the $p$-th best solution of person $i$. When HLO conducts the individual learning operator, equation (4) is applied to generate a new candidate solution, where $ik_{ipj}$ is the $j$-th bit of $ikd_{ip}$:

$$x_{ij} = ik_{ipj}. \quad (4)$$

Social learning operator: During social learning, people can acquire knowledge and experience from other individuals to further develop their ability directly or indirectly [50], and the efficiency and effectiveness of learning are improved by sharing experience [51]. To simulate the social learning of humans in HLO, the social knowledge database (SKD) is adopted to reserve the best knowledge of the population as equation (5), where $S$ is the size of the SKD and $skd_q$ is the $q$-th solution in the SKD. Here $q$ is a stochastic number that decides which solution of the SKD is used. HLO performs the social learning operator as equation (6) to generate the new candidate solution during the search process, where $sk_{qj}$ is the $j$-th bit of $skd_q$:

$$x_{ij} = sk_{qj}. \quad (6)$$

In summary, the above operators can be integrated and operated as

$$x_{ij} = \begin{cases} \text{random learning as equation (2)}, & 0 \le r \le pr, \\ ik_{ipj}, & pr < r \le pi, \\ sk_{qj}, & \text{otherwise}, \end{cases} \quad (7)$$

where $r$ is a stochastic number between 0 and 1, and pr and pi are the control parameters that determine the rates at which HLO performs the three learning operators. Specifically, pr, (pi − pr), and (1 − pi) are the probabilities of random learning, individual learning, and social learning, respectively. Algorithm 1 describes the implementation of HLO, and more details can be found in [35].
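As an illustration of how the three operators combine for one individual, a minimal Python sketch is given below. It is not the authors' implementation: the function name `hlo_new_candidate`, the fair-coin form of random learning, and the uniform choice of the IKD and SKD entries are assumptions made only for this example.

```python
import random


def hlo_new_candidate(ikd, skd, pr, pi, rng=random):
    """Generate one binary candidate bit by bit from the three HLO operators.

    ikd: the individual's best solutions (lists of 0/1 bits)
    skd: the population's best solutions (lists of 0/1 bits)
    pr, pi: control parameters with 0 <= pr <= pi <= 1
    """
    m = len(skd[0])
    candidate = []
    for j in range(m):
        r = rng.random()
        if r <= pr:
            # Random learning. Assumption: each bit value is equally likely.
            bit = 0 if rng.random() <= 0.5 else 1
        elif r <= pi:
            # Individual learning: copy bit j from an IKD entry.
            # Assumption: the entry p is chosen uniformly at random.
            p = rng.randrange(len(ikd))
            bit = ikd[p][j]
        else:
            # Social learning: copy bit j from an SKD entry chosen at random.
            q = rng.randrange(len(skd))
            bit = skd[q][j]
        candidate.append(bit)
    return candidate
```

With pr = 0 and pi = 1 every bit comes from the IKD, and with pr = pi = 0 every bit comes from the SKD, which makes the role of the two control parameters easy to check.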

Modified Binary Differential Evolution.
The MBDE [47] adopts the binary-coding scheme and reserves the updating formulas of the standard DE, including the mutation operator, the crossover operator, and the selection operator. A probability estimation operator is introduced into MBDE and integrated with the mutation operator.
Probability estimation operator: The probability estimation operator is used to build the probability distribution vector $f(p^{G}_{i})$ of the parent individuals. The new mutant binary individual $u'^{G}_{ij}$ is generated by randomly sampling from the parents through the probability estimation vector as equations (8) and (9), where $F$ is the scaling factor and $b$ denotes the bandwidth factor, which is a positive real constant; $p^{G}_{r1,j}$, $p^{G}_{r2,j}$, and $p^{G}_{r3,j}$ are the $j$-th bits of three randomly chosen individuals of generation $G$; rand is a random number; and $u'^{G}_{ij}$ is the mutant of the current target individual according to the probability estimation vector $f(p^{G}_{ij})$.

Crossover operator: The crossover operator is used to produce the trial individual by mixing the target individual and its mutant individual in MBDE. The trial vector $v'^{G+1}_{ij}$ can be obtained as equation (10), where $v'_{ij}$ is the element of the trial individual $v'_{i}$ and CR is the crossover probability in the range (0, 1). Here rand is a stochastic number uniformly distributed within (0, 1), and $rand_i$ is a random integer in {1, 2, ..., N}, where N is the length of the individual.

Computational Intelligence and Neuroscience
Selection: The selection operator is defined as equation (11). As shown in (11), MBDE reserves the selection operator of the standard DE. The trial individual $v_i$ replaces the target individual $x_i$ if its fitness value is better. Otherwise, the target individual is reserved for the next generation.
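The crossover and selection steps described above can be sketched in Python as follows. The sketch assumes the standard binomial crossover of DE (a mutant bit is taken when rand ≤ CR or at the forced index rand_i) and a maximization problem; the function names `binomial_crossover` and `select` are hypothetical, and the probability estimation operator is not reproduced here.

```python
import random


def binomial_crossover(target, mutant, cr, rng=random):
    """Mix target and mutant bits into a trial individual.

    Assumption: standard DE binomial crossover. One randomly chosen
    position (rand_i in the text) always takes the mutant bit, so the
    trial individual inherits at least one bit from the mutant.
    """
    n = len(target)
    forced = rng.randrange(n)
    return [
        mutant[j] if (rng.random() <= cr or j == forced) else target[j]
        for j in range(n)
    ]


def select(target, trial, fitness):
    """Keep the trial only if its fitness is strictly better (maximization assumed)."""
    return trial if fitness(trial) > fitness(target) else target
```

For example, with CR = 1 the trial individual equals the mutant, while with CR close to 0 it differs from the target in essentially one position only.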

Differential Human Learning Optimization Algorithm
The three operators of HLO represent humans learning randomly, learning from their own experience, and learning from collective experience. However, people can also learn from other excellent individuals in actual life. The operator of Differential Evolution (DE) is updated based on the optimal information of other individuals in the population. Inspired by this thought, this paper proposes the differential human learning optimization algorithm (DEHLO), in which the learning strategy of MBDE is introduced into HLO to develop a novel probability estimation operator for generating the offspring individuals. HLO is modified at two levels, i.e., the individual level and the population level, and the resulting algorithms are named DEHLO1 and DEHLO2, respectively.

DEHLO1.
During the real learning process, different teams always adopt different strategies to search for the optimal solution of the same complex problem. To emulate this phenomenon of dividing into groups, the operators of both HLO and MBDE are utilized to generate new solutions in DEHLO1, so that DEHLO1 can obtain the performance of HLO and MBDE. In DEHLO1, half of the population is updated by the operator of HLO as equation (7) to generate a new solution, and the rest of the population is updated by the operators of MBDE as equations (8)-(10) to acquire the new individual. DEHLO1 could therefore inherit both the advantages and the shortcomings of HLO and MBDE, and a dynamic competition strategy is used in DEHLO1 to avoid the disadvantages of the two algorithms. At the beginning of a search, the population is divided into two equal parts, which adopt the strategies of HLO and MBDE, respectively. As the search progresses, the best fitness obtained by HLO and that obtained by MBDE are compared after a specified number of iterations; the proportion of individuals assigned to the algorithm with the better fitness value is increased, while the proportion assigned to the other algorithm is decreased correspondingly. Therefore, DEHLO1 can adaptively compete and use the better learning strategy to search for the optimal solution, which effectively enhances the optimization ability of the algorithm. The procedure of DEHLO1 is illustrated in Figure 1.
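The dynamic competition described above can be sketched as a simple update of the population share held by each strategy. This is only an illustrative sketch: the function names, the fixed step of 5% applied at each comparison, and the lower and upper bounds that keep both strategies alive are assumptions, and a maximization problem is assumed.

```python
def update_hlo_share(hlo_share, best_hlo, best_mbde,
                     step=0.05, min_share=0.1, max_share=0.9):
    """Return the new fraction of the population updated by HLO.

    Called once per comparison interval. The strategy with the better
    best fitness gains `step` of the population; ties leave the split
    unchanged. The bounds are a hypothetical safeguard, not from the paper.
    """
    if best_hlo > best_mbde:
        hlo_share += step
    elif best_mbde > best_hlo:
        hlo_share -= step
    return min(max_share, max(min_share, hlo_share))


def split_sizes(pop_size, hlo_share):
    """Translate the share into (individuals using HLO, individuals using MBDE)."""
    n_hlo = round(pop_size * hlo_share)
    return n_hlo, pop_size - n_hlo
```

Starting from an equal split of 0.5, a comparison won by HLO moves the share to 0.55, so a population of 100 would be divided into 55 individuals updated by HLO and 45 updated by MBDE.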

DEHLO2.
Initialize the population X randomly
Calculate the fitness of the whole population f(X)
Initialize the IKDs and SKD
while the stop criterion is not met do
    for i = 1 to N do
        for j = 1 to M do
            if (r ≥ 0 and r ≤ pr) then
                Generate x_ij as equation (2)
            else if (r > pr and r ≤ pi) then
                Generate x_ij as equation (4)
            else if (r > pi and r < 1) then
                Generate x_ij as equation (6)
            end if
        end for
    end for
    Calculate the fitness function f(X)
    Update the IKDs and SKD
end while
ALGORITHM 1: Pseudocode of HLO.

In real society, the same problem can be solved by different approaches. However, there might be a mainstream method in a certain period, and the mainstream method might be switched to another method according to the needs of the problem. This is exactly the way of human learning: "practice, knowledge, again practice, and again knowledge" [52]. This form repeats itself in endless cycles, and with each cycle, the content of practice and knowledge rises to a higher level. This learning process can be vividly described as a spiral. In DEHLO2, HLO and MBDE are applied to the whole population and executed alternately to mimic these learning behaviors. Firstly, the entire population adopts the HLO algorithm to search for the optimal solution. If the best solution cannot be updated after a specified number of iterations, the learning process of HLO is considered to have encountered a bottleneck; then the strategy of MBDE is executed, which might help the algorithm escape from the bottleneck. Conversely, if the MBDE algorithm cannot update the best solution after a certain number of iterations, the HLO algorithm is executed to update the individuals of the population. The flowchart of DEHLO2 is shown in Figure 2.
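The alternation between HLO and MBDE in DEHLO2 can be sketched as a small state machine driven by a stagnation counter. The class name `StrategySwitcher` and its interface are hypothetical, and reading `nh` and `nm` as the numbers of non-improving generations tolerated under HLO and MBDE, respectively, is an assumption made for this example.

```python
class StrategySwitcher:
    """Track stagnation of the global best and alternate between strategies."""

    def __init__(self, nh=50, nm=100, start="hlo"):
        # Assumed meaning: generations without improvement tolerated
        # before leaving HLO (nh) or MBDE (nm).
        self.limits = {"hlo": nh, "mbde": nm}
        self.active = start
        self.stalled = 0

    def report(self, improved):
        """Record one generation and return the strategy for the next one."""
        if improved:
            self.stalled = 0
        else:
            self.stalled += 1
            if self.stalled >= self.limits[self.active]:
                # Bottleneck reached: hand the whole population to the other strategy.
                self.active = "mbde" if self.active == "hlo" else "hlo"
                self.stalled = 0
        return self.active
```

In a main loop, `report` would be called once per generation after the fitness evaluation, and the returned label would decide whether the next population is generated by the HLO operators or by the MBDE operators.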
The procedure of DEHLO2 can be described as follows:
Step 1: Set the control parameters, including the population size (popSize), the maximum generation (G max), the iterations of the search strategy, and the control parameters of HLO and MBDE;
Step 2: Initialize the population randomly, calculate the fitness of each individual, and initialize the IKD and SKD;
Step 3: Update the individuals of the population as equations (8)-(11) of the MBDE algorithm; when the global optimum of MBDE cannot be updated after the set iterations, use the HLO algorithm to update the individuals of the population as equation (7), and so forth, to generate the new population;
Step 4: Calculate the fitness of the new individuals and update the IKD and SKD;
Step 5: If the terminal conditions are met, terminate the iteration; otherwise go to Step 3;
Step 6: Output the optimal solution.

Therefore, the running time of each iterative step is ((3N + 1) × M + log(N × S × K N )). Assume that the maximum generation of the DEHLOs algorithms is G, so the iterative search phase takes time G × ((3N + 1) × M + log(N × S × K N )). In general, the maximum generation G is much greater than N, K, and S, and therefore the time complexity of DEHLOs is

Experimental Results and Discussions
To verify the performance of the two algorithms, i.e., DEHLO1 and DEHLO2, the proposed DEHLOs as well as six other binary-coding optimization algorithms, i.e., Improved Adaptive Human Learning Optimization (IAHLO) [37], Simple Human Learning Optimization (SHLO) [34], Modified Binary Differential Evolution (MBDE) [47], Novel Binary Differential Evolution (NBDE) [53], Improved Binary Particle Swarm Optimization (IBPSO) [54], and Novel Binary Gaining Sharing Knowledge-based optimization (NBGSK) [17], were applied to solve multidimensional knapsack problems [55]. The parameters pr, pi, CR, F, and b adopt the default values of HLO and MBDE, and a set of fair parameters, i.e., Cn and K of DEHLO1 and NM and NH of DEHLO2, is chosen by trial and error in this paper, that is, Cn = 100, K = 5%, NM = 100, and NH = 50. For a fair comparison, the recommended parameters of all compared algorithms were used to tackle the problems.

In the multidimensional knapsack problem, the binary decision variables x_j indicate whether item j is included in the knapsack or not. Without loss of generality, knapsack problems assume that all profits and weights are positive and all the weights are smaller than the capacity C. Since the maximal volume of the knapsack is limited and the total volume of the items packed in the knapsack may exceed the constraint, such a violation is unacceptable and must be checked. Thus, the penalty function method as equation (13) is adopted to deal with infeasible solutions, where the penalty coefficient β is a big constant that leads the algorithm to escape from the infeasible area. For a comprehensive comparison, a total of 30 multidimensional knapsack problems (MKPs), i.e., the instances 5.250.00-29, are adopted to test the performance of DEHLOs as well as the other metaheuristics. The population size and the maximum generation of all the algorithms are set to 100 and 5000, respectively.
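To make the penalty approach concrete, a minimal Python sketch of a penalized fitness function for a multidimensional knapsack instance is given below. The exact form of equation (13) is not reproduced here; the sketch assumes a penalty that is linear in the total capacity violation, and the names `mkp_fitness`, `profits`, `weights`, and `capacities` as well as the default value of `beta` are hypothetical.

```python
def mkp_fitness(x, profits, weights, capacities, beta=1e6):
    """Penalized profit of a binary solution x (to be maximized).

    profits: profit of each item
    weights: one row of item weights per constraint
    capacities: one capacity per constraint
    beta: large penalty coefficient (assumed value)
    """
    total_profit = sum(p * xj for p, xj in zip(profits, x))
    violation = 0.0
    for row, cap in zip(weights, capacities):
        load = sum(w * xj for w, xj in zip(row, x))
        # Only the amount exceeding the capacity is penalized.
        violation += max(0.0, load - cap)
    return total_profit - beta * violation
```

A feasible solution keeps its plain profit, whereas any infeasible solution receives a strongly negative fitness, so selection drives the population back toward the feasible region.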
Four indicators, i.e., the best fitness value (Best), the mean best fitness value (Mean), the worst fitness value (Worst), and the standard deviation (Std), are used to evaluate the performance of DEHLOs. Each algorithm was run 100 times independently on all the problems. The numerical results are given in Table 2.
To better compare the performance of DEHLOs with other algorithms, the results of Student's t-test (t-test) and the Wilcoxon signed-rank test (W-test) are also listed in Table 2, where "1" indicates that DEHLO2 is significantly better than the compared algorithm at the 95% confidence level, "−1" represents that DEHLO2 is significantly worse than the compared algorithm, and "0" denotes that the performance of DEHLO2 is equivalent to that of the compared algorithm. Note that the t-test, a parametric test, requires normality and homogeneity of variance, while the W-test, a nonparametric test, does not. Therefore, the t-test is more reliable when the Gaussian distribution assumption is met, while the W-test is more powerful when this assumption is violated [35]. For convenience, the results of the t-test and W-test are summarized in Table 3. According to Tables 2 and 3, it is fair to say that DEHLO2 outperforms the other algorithms on the multidimensional knapsack problems.

Another Set of Multidimensional Knapsack Problems.
To further verify the performance of the proposed algorithms, another set of multidimensional knapsack problems [53] is adopted as the test benchmark, which is listed in Table 4. The results of all algorithms on the MKPs are given in Table 5, where the best solutions are highlighted in bold. The results of the t-test and W-test are summarized in Table 6. To analyze the superiority of the proposed DEHLOs, the convergence curves of all algorithms on the MKPs are drawn in Figure 3.
It can be seen from Tables 5 and 6 and Figure 3 that DEHLO2 provides the best results and obtains the minimum error among all the algorithms. Specifically, DEHLO2 attains the best numerical results on 13 out of 14 instances and is only inferior to DEHLO1 on the instance 5.500.01. The summarized t-test and W-test results indicate that the proposed DEHLO2 significantly surpasses IAHLO, HLO, MBDE, NBDE, IBPSO, and NBGSK on all the instances, while it is better than, competitive with, and worse than DEHLO1 on 10, 4, and 0 instances on the t-test and on 11, 3, and 0 instances on the W-test, respectively. Furthermore, Figure 3 shows that the proposed DEHLOs have a faster convergence rate and higher solution accuracy than the compared algorithms. Therefore, with the introduction of the strategy of MBDE, the optimization performance of the DEHLOs is significantly enhanced.

[Table: parameter settings of the compared algorithms; recoverable entries: [34] pr = 5/M, pi = 0.85 + 2/M; MBDE [47] CR. Note. M is the dimension of solutions.]

Conclusions and Future Work
Human learning optimization is a simplified model of human learning; it develops three learning operators, i.e., the random learning operator, the individual learning operator, and the social learning operator, to search for the optimal solution. However, the standard HLO learns only from the global optimal solution, which is inconsistent with reality. In real life, people can learn from the optimal solutions of other individuals, and the operators of Differential Evolution (DE) are updated based on the optimal solutions of other individuals. Inspired by this fact, this paper introduces the optimization strategy of MBDE into HLO and presents two novel differential human learning optimization algorithms at the individual level and the population level. To comprehensively and fairly evaluate the performance of the proposed algorithms, multidimensional knapsack problems were adopted as the benchmark problems to test DEHLOs, as well as the standard HLO, MBDE, and other metaheuristics. The experimental results demonstrate that the proposed DEHLOs can utilize the learning ability of the two algorithms to search for the optimal solution more efficiently and have a robust search ability for different problems. It is well known that humans can adaptively choose and adjust their approaches to solve problems efficiently and effectively. However, the impact of an adaptive learning strategy on the algorithm parameters is not considered in this paper. Therefore, one of our future works is to develop adaptive switching learning strategies to better release the power of different learning strategies for different problems, which will be very challenging.

Data Availability
As the data also form part of an ongoing study, the raw/processed data required to reproduce these findings cannot be shared at this time.